In this paper, we analyse web-downloaded data on people sharing their music libraries, which we
use as their individual musical signatures (IMS). The system is represented by a bipartite network,
whose nodes are the music groups and the listeners. The audience size of music groups behaves like
a power law, whereas the size of individual music libraries is an exponential with deviations at
small values. In order to extract structures from the network, we focus on correlation matrices,
which we filter by removing the least correlated links. This percolation idea-based method reveals
the emergence of social communities and music genres, which are visualised by a branching
representation. Evidence of collective listening habits that do not fit the neat usual genres
defined by the music industry indicates an alternative way of classifying listeners/music groups.
The structure of the network is also studied by a more refined method, based upon a random walk
exploration of its properties. Finally, a personal identification - community imitation model (PICI)
for growing bipartite networks is outlined, following Potts ingredients. Simulation results
reproduce the empirical data quite well.

PACS numbers: 89.75.Fb, 89.75.Hc, 87.23.Ge



I. INTRODUCTION 



Answering a common question such as "What kind of music do you listen to?" is not an easy task,
and the answer is full of hidden information about oneself. Indeed, music is omnipresent in our
society and is part of everyday life. Moreover, it is well known in the social sciences [?] that
music does not function merely as entertainment, but is deeply related to identity-building and
community-building. In that sense, personal musical choices derive from a subtle interplay between
cultural framework inheritance, social recognition and personality identification. These reinforce
one's self-image and send messages to others [?]. Due to the complexity of taste formation and the
richness of available music, it is tempting to postulate that someone's music library is a unique
signature of himself/herself [?]. For instance, it is interesting to point to the empirical study
by D'Arcangelo, which shows that listeners strongly identify with their musical choices, some even
going so far as to equate their music collection with their personality: "My personality goes in
my iPod", as one interviewed person claims. Consequently, it is difficult for people to recognise
themselves in the usual music divisions, such as punks versus metal heads, or jazz versus pop
listeners. More commonly, they answer the above question with "Everything... a little bit of
everything".

Recently, attempts have been made to characterise the musical behaviour of individuals and groups
using methods from quantitative sociology and social network analysis. These attempts were made
possible by the huge music databases now available, associated with the current transition from
materialised music (LPs, CDs...) to computer-based listening habits (iTunes, iPod...). Amongst
other studies, let us cite the recent empirical work by Voida et al. [?], which shows that people
form judgements about colleagues based on the taste - or lack of taste - revealed by their music
collection, and admit to tailoring their own music library to project a particular persona.

The present paper focuses on these musical behaviours from a statistical physics point of view,
by analysing individual musical signatures and extracting collective trends. This issue is part of
the intense ongoing research activity of physicists on opinion formation [7, 8, ?], itself related
to phase transitions and self-organisation on networks [?], including clique formation [?]. The
characteristics of such phenomena depend on the type of network, as well as on the data size,
thereby questioning universality, in contrast with Statistical Mechanics.

In section 2, we extract empirical data from collaborative filtering websites, e.g.
audioscrobbler.com and musicmobs.com. These sites possess huge databases that characterise the
listening habits of their users, and allow these users to discover new music. Our analysis consists
in applying methods from complex network theory [?] in order to characterise the musical signatures
of a large number of individuals. In section 3, we present original percolation idea-based (PIB)
methods in order to visualise the collective behaviours. They consist in projecting the bipartite
network, where listeners and music groups are linked, onto a unipartite network, i.e. a network
where listeners are connected depending on the correlations between their music tastes. Let us
stress that the usual projection method ([?], see details below), used in co-authorship networks
for instance, does not apply to the networks considered here, as it leads to almost fully connected
networks. In this work, we also project the bipartite network onto a network of music groups, and
probe the reality of the usual music divisions, e.g. rock, alternative & punk, classical. We
propose a quantitative way to define more refined musical subdivisions. These subdivisions, which
are not based upon usual standards but rather upon the intrinsic structure of the audience, may
lead to the usual music genres in some particular cases, but also reveal unexpected collective
listening habits. Let us note that other techniques may also lead to an objective classification of
music, e.g. by characterising their time correlation properties [17]. In general, the
identification of a priori unknown collective behaviours is a difficult task, and of primordial
importance for the structural and functional properties of various networked systems, e.g. proteins
[?, 19], industrial sectors, groups of people [?]... Consequently, we also use another method in
section 4 in order to uncover these structures, i.e. the percolated island structure is explored by
randomly walking (RW) the network, and by studying the properties of the RW with standard
statistical tools. Finally, in section 5, we present a growing network model, whose ingredients are
very general, i.e. personal identification and community imitation (PICI). The model reproduces the
observed degree (number of links per node) distributions of the networks as well as their internal
correlations.



II. METHODOLOGY 

Recently, new kinds of websites have been dedicated to the sharing of musical habits. These sites
first allow members to upload their music libraries, previously stored on their computers, to a
central server, and next to create a web page containing this list of music groups. Additionally,
the website lets users discover new music by comparing their taste with that of other users. Such
methods, which make automatic predictions about the interests of a user by collecting information
from many (collaborating) users, are usually called Collaborative Filtering [22]. The data that we
analyse here were downloaded from audioscrobbler.com in January 2005. They consist of a listing of
users (each represented by a number), together with the list of music groups the users own in their
library. This structure directly leads to a bipartite network for the whole system. Namely, it is a
network composed of two kinds of nodes, i.e. the persons, called users or listeners in the
following, and the music groups. The network can be represented by a graph with edges running
between a group $i$ and a user $\mu$, if $\mu$ owns $i$.

In the original data set, there are 617900 different music groups and 35916 users. The number of
groups is skewed by the multiple (even erroneous) ways for a user to characterise an artist (e.g.
Mozart, Wolfgang Amadeus Mozart and Wolfgang Amedeus Mozart count as three music groups). There are
5028580 links in the bipartite graph, meaning that, on average, each user owns 140 music groups in
his/her library, while each group is owned by 8 persons. For completeness, let us note that the
listener with the most groups possesses 4072 groups (0.6% of the total music library) while the
group with the largest audience, Radiohead, has 10194 users (28% of the user community). This
asymmetry in the bipartite network is expected, as users have in general specific tastes that
prevent them from listening to any kind of music, while there exist mainstream groups that are
listened to by a very large audience. This asymmetry is also observable in the degree distributions
for the people and for the groups. The former distribution (see Fig. 1) is fitted with an
exponential $e^{-n/150}$ for large $n$, and the latter with a power law $n^{-1.8}$, where $n$ is
the number of links per node, i.e. $n_G$ or $n_L$ for groups and listeners respectively. Let us
stress that such distributions are common in complex networks [15]. For instance, co-authorship
networks also exhibit a bipartite asymmetry, and a power-law distribution $n^{-\alpha}$, with
$\alpha \simeq 2$ [23].
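
The averages quoted above follow directly from the link count. As a minimal sketch, assuming the
data are held as a mapping from user to the set of owned groups (the function name and the toy
libraries below are hypothetical, not part of the data set), the degree statistics of the bipartite
graph can be computed as follows:

```python
from collections import defaultdict


def degree_statistics(libraries):
    """libraries: dict mapping a user id to the set of music groups owned."""
    audience = defaultdict(set)  # group -> set of users owning it
    for user, groups in libraries.items():
        for group in groups:
            audience[group].add(user)
    n_links = sum(len(groups) for groups in libraries.values())
    return {
        "links": n_links,
        "groups_per_user": n_links / len(libraries),
        "users_per_group": n_links / len(audience),
        "audience": dict(audience),
    }


# Toy libraries (hypothetical, for illustration only).
toy = {
    1: {"Radiohead", "Nirvana"},
    2: {"Radiohead"},
    3: {"Radiohead", "Weezer", "Nirvana"},
}
stats = degree_statistics(toy)
print(stats["links"], stats["groups_per_user"], stats["users_per_group"])

# Averages of the full data set, from the figures quoted in the text.
print(5028580 / 35916, 5028580 / 617900)
```

The same two ratios, links per user and links per group, are what the text rounds to 140 and 8.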

Finally, let us mention the top ten groups in rank order: Radiohead, Nirvana, Coldplay, Metallica,
The Beatles, Red Hot Chili Peppers, Pink Floyd, Green Day, Weezer and Linkin Park. Obviously, the
examined sample is oriented toward recent rock music. This fact has to be kept in mind, as it
determines the mainstream music trend in the present sample, and could be a constraint on expected
universality. This is left for further studies.

A common way to represent and to study bipartite networks consists in projecting them onto nodes of
one kind [?]. The standard projection method simplifies the system to a unipartite network, where
nodes are e.g. the listeners and where two listeners are connected if they have at least one music
group in common. This scheme, which leads to a helpful representation in the case of collaboration
networks [23, 25], is unfortunately meaningless in the case under study. Indeed, due to the
existence of mainstream music groups, the unipartite network is almost fully connected, i.e. most
of the listeners are linked in the unipartite representation. For instance, Radiohead fully
connects 28% of the user community, whatever the rest of their music library contains. This
projection method definitely leads to an oversimplified and useless representation. We refine it by
focusing on correlations between the users' libraries. To do so, we define for each listener $\mu$
the $n_G$-vector $\sigma^{\mu}$:

$$\sigma^{\mu} = (\ldots, 1, \ldots, 0, \ldots, 1, \ldots) \qquad (1)$$

where $n_G = 617900$ is the total number of groups in the system, $\mu \in [1, 35916]$, and where
$\sigma^{\mu}_i = 1$ if $\mu$ owns group $i$ and $\sigma^{\mu}_i = 0$ otherwise. This vector is
used as the individual musical signature (IMS), as mentioned in the introduction.
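
To illustrate why the standard projection saturates, here is a minimal sketch under the assumption
that libraries are stored as sets of group names (the function name and the toy data are
hypothetical). A single group shared by several listeners links every pair among them, whatever
else they own:

```python
from itertools import combinations


def standard_projection(libraries):
    """Link two listeners as soon as they share at least one music group."""
    users = sorted(libraries)
    return {
        (a, b)
        for a, b in combinations(users, 2)
        if libraries[a] & libraries[b]
    }


# Three listeners share one mainstream group; the fourth shares nothing.
toy = {
    1: {"Radiohead", "A"},
    2: {"Radiohead", "B"},
    3: {"Radiohead", "C"},
    4: {"D"},
}
print(sorted(standard_projection(toy)))
```

In this toy case the three owners of the shared group form a complete triangle although their
libraries otherwise differ, which is the effect described above for mainstream groups.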

FIG. 1: (a) Histogram of the number of music groups per user. The tail is fitted with the
exponential $e^{-n/150}$ (dashed line), where $n$ is the number of links per node. A specific
interval examined in the text is indicated by vertical lines: average $\langle n_G \rangle \sim
140$, width 50; (b) Histogram for the audience size (number of listeners) per music group. This
distribution behaves like the power law $\sim n_L^{-1.8}$.

In the following, we restrict the analysis to a subset of the users for computational reasons. To
do so, we have analysed a subset of $n_P = 3806$ persons having a number of groups in the interval
[115, 165] (see Fig. 1), i.e. around the average value 140. In order to quantify the correlations
between two persons $\mu$ and $\lambda$, we introduce the symmetric correlation measure:

$$C^{\mu\lambda} = \frac{\sigma^{\mu} \cdot \sigma^{\lambda}}{|\sigma^{\mu}|\,|\sigma^{\lambda}|} \equiv \cos\theta_{\mu\lambda} \qquad (2)$$

where $\sigma^{\mu} \cdot \sigma^{\lambda}$ denotes the scalar product between the two
$n_G$-vectors, and $|\cdot|$ its associated norm. This correlation measure, which corresponds to
the cosine of the angle between the two vectors in the $n_G$-dimensional space, vanishes when the
persons have no common music groups, and is equal to 1 when their music libraries are strictly
identical.
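
For binary signatures, the scalar product is simply the number of common groups and the squared
norm is the library size, so the cosine can be evaluated on sets without building the full
vectors. The following is a minimal sketch under that assumption (function and variable names are
illustrative only):

```python
import math


def cosine_similarity(library_a, library_b):
    """Cosine between two binary signatures stored as sets of group names."""
    if not library_a or not library_b:
        return 0.0
    common = len(library_a & library_b)  # scalar product of 0/1 vectors
    return common / math.sqrt(len(library_a) * len(library_b))


print(cosine_similarity({"a", "b"}, {"a", "b"}))  # identical libraries
print(cosine_similarity({"a", "b"}, {"b", "c"}))  # one group in common
print(cosine_similarity({"a"}, {"b"}))            # nothing in common
```

Working with sets keeps the cost proportional to the library sizes (about 140 groups) rather than
to the total number of groups in the system.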

FIG. 2: Probability distribution of the matrix elements. The dashed line is the fitted exponential
$\gamma e^{-\gamma C^{\mu\lambda}}$, with $\gamma = 31$.

At this level, the search for social communities therefore requires the analysis of the $n_P
\times n_P$ correlation matrix $C^{\mu\lambda}$. A first relevant quantity is the distribution of
the matrix elements $C^{\mu\lambda}$, which statistically characterises the correlations between
listeners. Empirical results show a rapid exponential decrease of the correlation distribution
(Fig. 2), which we fit with $31\, e^{-31\, C^{\mu\lambda}}$, so that people in the sample are
clearly discriminated by their music taste, i.e. they are characterised by non-parallel vectors.
This justifies the use of his/her music library as a unique IMS of the listener.



III. PERCOLATION IDEA-BASED FILTERING 

A. Listeners network 

In order to extract communities from the correlation matrix $C^{\mu\lambda}$, we use the following
method. We define the filter coefficient $\phi \in [0, 1[$, and filter the matrix elements so that
$C_f^{\mu\lambda} = 1$ if $C^{\mu\lambda} > \phi$, and $C_f^{\mu\lambda} = 0$ otherwise. In figure
3, we show the graph representation of the filtered matrix for increasing values of $\phi$. For the
sake of clarity, we have only depicted the individuals that are related to at least one person,
i.e. lonely persons are self-excluded from the network structure, and hence from any community. One
observes that, starting from a fully connected network, increasing values of the filtering
coefficient remove the less correlated links and lead to the formation of communities. These
communities first occur through the development of strongly connected components [?] that are
peninsulas, i.e. portions of the network that are almost disconnected from the main cluster,
themselves connected by inter-community individuals. Further increasing the filtering coefficient
leads to a removal of these inter-community individuals, and to the shaping of well-defined
islands, completely disconnected from the main island. Let us stress that this systematic removal
of links is directly related to percolation
theory. It is therefore interesting to examine the structuring of the network along the lines of
percolation transition ideas. To do so, we compare the bifurcation diagram of the empirical data
with that obtained for a randomised matrix, i.e. a matrix constructed by a random re-disposition of
the elements $C^{\mu\lambda}$. As shown in Fig. 4a, the correlated structure of the network
broadens the interval of the transition as compared to the uncorrelated case. Moreover, the
correlations also seem to displace the location of the bifurcation, by requiring more links in
order to observe the percolation transition. This feature may originate from community structuring
that restrains network exploration as compared to random structures.

FIG. 3: Graph representation of the listener filtered correlation matrix for 3 values of the
filter parameter, $\phi = 0.275$, $\phi = 0.325$ and $\phi = 0.35$, displayed from top to bottom.
The graphs were plotted thanks to the visone graphical tools [?].

FIG. 4: (a) Proportion of nodes in the percolated island as a function of the filtering
coefficient $\phi$, for the original listeners correlation matrix $C^{\mu\lambda}$ and the
corresponding randomised matrix; (b) Dependence of the clustering coefficient $C$ on the filtering
coefficient $\phi$. The dashed line, at $C = 0.5$, is a guide for the eye.

As a first approximation, we restrict the scope to the formation of islands in the matrix analysis,
i.e. to the simplest organised structures. From now on, we therefore associate the breaking of an
island into sub-islands with the emergence of a new sub-community and, pursuing the analogy, we
call the largest connected structure the mainstream community. Before going further, let us stress
that the projection method described above is exactly equivalent to the standard projection method
when $\phi = 0$.

FIG. 5: Branching representation of a correlation matrix. At each increasing step ($t = 0, 1, 2$)
of the filter $\phi$, links are removed, so that the network decomposes into isolated islands.
These islands are represented by squares, whose size depends on the number of nodes in the island.
Starting from the largest island, branches indicate a parent relation between the islands. The
increasing filter method is applied until all links are removed.

In the following, we use a branching representation of the community structuring (see Fig. 5 for a
sketch of the first three steps of an arbitrary example). To do so, we start the procedure with the
lowest value of $\phi$, here $\phi = 0.2$, and we represent each isolated island by a square whose
surface is proportional to its number of listeners. Then, we slightly increase the value of
$\phi$, e.g. by 0.01, and we repeat the procedure. From one step to the next, we draw a bond
between emerging sub-islands and their parent island. The filter is increased until all bonds
between nodes are eroded (that is, there is only one node left in each island). Applied to the
above correlation matrix $C^{\mu\lambda}$ (Fig. 6), this representation leads to a compact
description of the series of graphs such as those found in Fig. 3. Moreover, the snake structure
gives some insight into the diversification process by following branches from their source toward
their extremity. The longer a given branch can be followed, the more likely it is to form a
well-defined community.
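
The branching construction can be sketched in a few lines. The version below is an illustration
under stated assumptions, not the procedure used to draw the figures: islands are connected
components found with a union-find structure, single-node islands are dropped from each level, and
the names `components` and `branching_tree` are hypothetical.

```python
def components(nodes, links):
    """Connected components of an undirected graph, via union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in links:
        parent[find(a)] = find(b)
    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return [frozenset(g) for g in groups.values()]


def branching_tree(corr, phi_values):
    """One level per filter value: a list of (island, parent island) pairs.

    Since raising the filter only removes links, every island is contained
    in exactly one island of the previous level, which is its parent.
    """
    nodes = {n for pair in corr for n in pair}
    levels, previous = [], []
    for phi in sorted(phi_values):
        links = [pair for pair, c in corr.items() if c > phi]
        current = [i for i in components(nodes, links) if len(i) > 1]
        level = []
        for island in current:
            parent = next((p for p in previous if island <= p), None)
            level.append((island, parent))
        levels.append(level)
        previous = current
    return levels


# Toy correlation values (hypothetical).
corr = {("a", "b"): 0.9, ("b", "c"): 0.4, ("c", "d"): 0.8}
for level in branching_tree(corr, [0.3, 0.5, 0.85]):
    print([(sorted(i), sorted(p) if p else None) for i, p in level])
```

Each printed level corresponds to one row of squares in the branching representation, and the
parent entries are the bonds drawn between successive rows.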

In order to focus on collective effects, we have studied in detail the behaviour of the clustering
coefficient [16], which is a measure of the density of triangles in a network, a triangle being
formed every time two of one node's neighbours are linked to each other. This quantity is a common
way to measure social effects in complex networks, and measures, roughly speaking, whether the
friend of a friend is a friend. In figure 4b, we plot the dependence of this quantity $C$ on the
filtering coefficient $\phi$. Moreover, in order to highlight the effects of correlations, we
compare the results with those obtained for the above randomised matrix. Our analysis shows a very
high value of $C$, almost independent of $\phi$, for the original matrix. This suggests that the
way people acquire their music taste is a highly social mechanism, likely related to its
identification role as described in the introduction.
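
Several definitions of the clustering coefficient are in use. The sketch below assumes the global
one, i.e. the fraction of connected triples (a node with two of its neighbours) that are closed
into a triangle; whether this or the average of local coefficients was used for the figure is not
stated in the text, so the choice here is an assumption, as are the names.

```python
from itertools import combinations


def clustering_coefficient(neighbours):
    """Closed triples divided by connected triples.

    neighbours maps each node to the set of its adjacent nodes in an
    undirected graph.
    """
    closed = triples = 0
    for adjacent in neighbours.values():
        for a, b in combinations(adjacent, 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1
    return closed / triples if triples else 0.0


# A triangle a-b-c with one pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(clustering_coefficient(graph))
```

Applying this function to the filtered graph at each value of the filter coefficient gives the kind
of curve discussed above.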



B. A typical individual musical signature

FIG. 6: In the upper figure, branching representation of the listener correlation matrix
$C^{\mu\lambda}$. The filtering, with parameter ranging from 0.2 to 0.5 (from bottom to top),
induces a snake of squares at each filtering level. The shape of the snake as well as its direction
are irrelevant. In the lower figure, branching representation of the music groups correlation
matrix, the filtering parameter ranging from 0.3 to 0.6 (from top to bottom).

Before focusing on the genre-fication of music groups, we give here, as an empirical example, the
music library of one person. This list is intended to indicate the diversity of groups that
characterise a listener, as well as his/her community. We write in bold the music groups that are
common to his/her sub-community, found by the PIB technique, and in normal characters those that
are owned only by the individual. There are 117 different music groups.






Music library: Air+ New Order+ Jane's Addiction+ DJ Krush+ Massive Attack+ DJ Shadow+ Beastie
Boys+ Orbital+ Blur+ Pixies+ Leftfield+ Sonic Youth+ David Bowie+ Primus+ Jeff Buckley+ The
Smiths+ Daft Punk+ Joy Division+ Smashing Pumpkins+ Chemical Brothers+ Korn+ Eminem+ Nirvana+
Radiohead+ Grandaddy+ Travis+ Oasis+ PJ Harvey+ Manic Street Preachers+ Roots Manuva+ Unkle+
Linkin Park+ Atari Teenage Riot+ Kula Shaker+ The Police+ James Iha+ Semisonic+ Weezer+
Anastacia+ Rob Dougan+ Eels+ Fatboy Slim+ Green Day+ Lostprophets+ System of a Down+ U.N.K.L.E.+
El-P+ Bee Gees+ Duran Duran+ Therapy?+ The Prodigy+ Foo Fighters+ JJ72+ Alkaline Trio+ The
Beatles+ Incubus+ Prodigy+ Muse+ And You Will Know Us By The Trai+ Jimmy Eat World+ Ash+ Rival
Schools+ Cher+ At The Drive-In+ Johnny Cash+ Mansun+ Queens of the Stone Age+ Basement Jaxx+ Dave
Matthews Band+ Dj Tiesto+ Cast+ The Strokes+ Anthrax+ Ian Brown+ Saves The Day+ Morrissey+
Police+ Modest Mouse+ Interpol+ St Germain+ The Beach Boys+ Bonnie Tyler+ Theme+ Fenix*TX+ Snow
Patrol+ The Cooper Temple Clause+ Buddy Holly+ Nada Surf+ onelinedrawing+ Michael Kamen+ Remy
Zero+ Ernie Cline+ Quicksand+ Olivia Newton John+ Polar+ Ikara Colt+ Keiichi Suzuki+ Rivers
Cuomo+ Paddy Casey+ Billy Talent+ Mireille Mathieu+ Jack Dee+ Tomoyasu Hotei+ Daniel O'Donnell+
Hope Of The States+ Franz Ferdinand+ The Shadows+ THE STILLS+ The RZA+ The Mamas and the Papas+
Melissa Auf Der Maur+ Barron Knights+ The Killers+ R.E.M.+ Jay-Z DJ Danger Mouse+ Pras Michel
Feat ODB and Maya+ The Monks Of Roscrea

Obviously, this person belongs to a music community 
characterised by a mixture of the usual music genres, in- 
cluding Pop/Rock, 80's Pop, Electro, Alternative... This 
eclecticism indicates the inadequacy of such music subdi- 
visions to characterise individual and collective listening 
habits. 



C. Music groups network 

In view of the above, it is interesting to introduce a new way to build music subdivisions, i.e.
based upon the listening habits of their audience. To do so, we have applied the PIB approach to a
sample composed of the top 5,000 most-owned groups. This limited choice was motivated by the
possibility to identify these groups at first sight. Each music group is characterised by its
signature, that is a vector:

$$\gamma^{i} = (\ldots, 1, \ldots, 0, \ldots, 1, \ldots) \qquad (3)$$

of $n_L$ components, where $n_L = 35916$ is the total number of users in the system, and where
$\gamma^{i}_{\mu} = 1$ if the listener $\mu$ owns group $i$ and $\gamma^{i}_{\mu} = 0$ otherwise.
By doing so, we consider that the audience of a music group, i.e. the list of persons listening to
it, identifies its signature, just as we assume that the music library of an individual
characterises his/her signature.

The next step consists in projecting the bipartite network onto a unipartite network of music
groups. To do so, we build the correlation matrix for the music groups as before, and filter it
with increasing values of the filtering coefficient. As previously, the action of filtering erodes
the nodes, thereby revealing a structured percolated island (Fig. 7) that breaks into small
islands. The resulting tree representation of the correlation matrix (Fig. 6) clearly shows long
persisting branches, thereby suggesting a high degree of common listenership. Poring over the
branches of the top 5000 tree [?], we find many standard, homogeneous style groupings. Amongst
these homogeneous cliques, there are [George Strait, Faith Hill, Garth Brooks, Clint Black, Kenny
Chesney, Shania Twain, Alan Jackson, Martina McBride, Alabama, Tim McGraw, Reba McEntire, Diamond
Rio, John Michael Montgomery, SheDaisy, Brooks and Dunn, Clay Walker, Rascal Flatts, Lonestar, Brad
Paisley, Keith Urban], [Kylie Minogue, Dannii Minogue, Sophie Ellis Bextor], [Serge Gainsbourg,
Noir Desir], [Billie Holiday, Glenn Miller, Benny Goodman], [Morrissey, Faith No More, Machine
Head, The Smiths, Rammstein, Smashing Pumpkins, Slipknot, Tomahawk, Mr. Bungle], that are country,
dance pop, geographically localised (i.e. France), swing jazz and rock groupings respectively.

In contrast, many of the islands are harder to explain from a standard genre-fication point of
view. In some cases, the island may be homogeneous in one music style, but show some unexpected
elements, like: [Spain In My Heart (Various), The Pogues, Dave Brubeck Quartet, Crosby, Stills,
Nash and Young, Phil Ochs, Billy Bragg, Clem Snide, Sarah Harmer, Mason Jennings, Kirsty MacColl,
tullycraft, Ibrahim Ferrer, Sarah Slean, Penguin Cafe Orchestra, Pretenders, Joe Strummer and The
Mescaleros, Freezepop], that is a folk/folk cluster, with odd members like the Brubeck Jazz Band,
for example. But other groupings defy monolithic style categorisation, like: [The Jon Spencer Blues
Explosion, Yello, Galaxie 500, Prince and the Revolution, Ultra Bra, Uriah Heep, Laurent Garnier],
[Crosby, Stills, Nash and Young, Orb, Zero 7, Royksopp, Stan Getz]. The latter include unexpected
mixtures of Indie Rock/Funk/Hard Rock/Dance, and Folk/Electro/Jazz respectively.

Consequently, the PIB approach reveals evidence of unexpected collective listening habits, thereby
uncovering trends in music. As a matter of fact, these anomalous entities are shared by multiple
listeners. This seems to confirm the role of collective listening habits in the individual building
of music taste. It is important to note that the PIB method neglects the internal structure of the
main island by identifying "music genres"/"listener communities" with isolated islands. This is
obviously a drastic simplification that may lead to the neglect of pertinent structures, and
therefore calls for a more detailed exploration of the network structure.



IV. RANDOM WALK EXPLORATION 

FIG. 7: In the upper figure, typical percolated island of music groups for $\phi = 0.45$. It is
composed of 247 nodes and 4406 links. In the lower figure, zoom on a small structure of the
percolated island, that is obviously composed of guitar heroes, e.g. B.B. King, S.R. Vaughan, A.D.
Meola... Let us also note that S.R. Vaughan appears in two different forms that are linked by our
analysis.

FIG. 8: (a) Time evolution of the distance to the initial node during the RW on the network of
Fig. 7. The network exploration clearly exhibits the passage from one cluster to another, followed
by long stopovers in the latter cluster; (b) Risk function $R(t)$ of the signal. The dashed lines
are guides for the eye, and represent the exponential relaxations $e^{-t/80}$ and $e^{-t/2200}$.

In this section, we consider an alternative method for revealing the internal structures of the
network. The method is based on a random walk exploration of the percolated island. The random walk
(RW) starts at some node, i.e. the initial node. At each time step, we randomly choose one of the
links of the occupied node, and move the walker to the connected node. Moreover, we keep track of
the distance from the occupied node to the initial node as a function of time $t$. By definition,
the distance between two nodes is the length of the shortest path between them. The initial node is
chosen to be the central node of the percolated island, namely the node $c$ that minimises the
average distance to the other nodes in the island:

$$\langle d_c \rangle = \frac{1}{n_I - 1} \sum_{i \neq c} d_{ci}$$

where $d_{ci}$ is the distance between nodes $c$ and $i$, and $n_I$ is the number of nodes in the
island.

In the following, we focus on the percolated island of figure 7, which is composed of $n_I = 247$
nodes and 4406 links. The percolated island clearly exhibits peninsulae that link similar music
groups. For instance, the cluster in the centre of the figure is "hard rock" oriented, with music
groups like The Killing Tree, Unearth, Murder by Death... This is also illustrated in the lower
graph of figure 7, where we zoom on a small structure that encompasses guitar heroes, e.g. B.B.
King, S.R. Vaughan, A.D. Meola, G. Benson... In the case under study, the central node is the music
group Murder by Death, which is located in the hard rock cluster.

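
The choice of the central node and the walk itself can be sketched as follows. This is a minimal
illustration assuming a connected, undirected graph stored as an adjacency dictionary; the function
names and the toy path graph are hypothetical, and the random number seed is fixed only to make the
run repeatable.

```python
import random
from collections import deque


def distances_from(neighbours, source):
    """Shortest-path lengths from source, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def central_node(neighbours):
    """Node minimising the average distance to the other nodes."""
    n = len(neighbours)

    def mean_distance(c):
        return sum(distances_from(neighbours, c).values()) / (n - 1)

    return min(neighbours, key=mean_distance)


def random_walk(neighbours, start, steps, seed=0):
    """Distance to the start node after each step of an unbiased walk."""
    rng = random.Random(seed)
    dist = distances_from(neighbours, start)
    node, series = start, []
    for _ in range(steps):
        node = rng.choice(sorted(neighbours[node]))  # pick one link at random
        series.append(dist[node])
    return series


# Toy island: a path a-b-c-d-e, whose central node is c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
centre = central_node(path)
print(centre, random_walk(path, centre, steps=10))
```

On a real percolated island the resulting series alternates between short excursions inside one
cluster and long stays in another, which is what the time series analysis below exploits.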
FIG. 9: Simulation results of PICIM for parameters given
and explained in the text : (a) distribution of the number of 
music groups per user ; (b) distribution of the audience per 
group. The dashed line is the power-law ~ n~^'^. 



The resulting time series (Fig. 8a), which is directly related to the underlying path geometry,
seems to indicate the existence of different time scales, associated with the large-scale
structures in the network. In order to analyse the time series, we have focused on the probability
of return toward the initial node. To do so, we have measured the time intervals $\tau$ between two
passages of the walker on the initial node, and calculated the distribution $f(\tau)$ of these time
intervals. Moreover, in order to study the rare events associated with the tail of the
distribution, we focus on the risk function $R(t) = \int_t^{\infty} f(\tau)\, d\tau$. By
construction, $R(0) = 1$ and $R(\infty) = 0$. The results, which are plotted in figure 8b, clearly
reveal two time scales: a rapid time scale (80 time steps) that determines the internal dynamics in
one cluster; a slow time scale (2200 time steps) that characterises the passage from one cluster to
another one. Let us stress that a detrended fluctuation analysis of the random walk [?] leads to
the same conclusion.
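
The return-time statistics can be estimated directly from the distance series. The sketch below
assumes that the series lists the distance to the initial node after each step, that the walker
sits on the initial node at time 0, and that the risk function is estimated as the fraction of
return intervals longer than $t$; the names and the toy series are hypothetical.

```python
def return_intervals(series):
    """Time intervals between successive visits to the initial node.

    series[k] is the distance to the initial node after step k + 1; the
    walker is assumed to start on the initial node at time 0.
    """
    visits = [0] + [t for t, d in enumerate(series, start=1) if d == 0]
    return [later - earlier for earlier, later in zip(visits, visits[1:])]


def risk_function(intervals, t):
    """Empirical R(t): fraction of return intervals strictly longer than t."""
    return sum(1 for tau in intervals if tau > t) / len(intervals)


# Toy distance series (hypothetical): returns at times 2, 6 and 8.
series = [1, 0, 1, 2, 1, 0, 1, 0]
intervals = return_intervals(series)
print(intervals)
print([risk_function(intervals, t) for t in range(5)])
```

With this estimator $R(0) = 1$ holds by construction, since every return interval lasts at least
one step, and $R(t)$ vanishes beyond the longest observed interval.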



V. PERSONAL IDENTIFICATION - 
COMMUNITY IMITATION MODEL 

The empirical results of the previous sections suggest that a person's musical taste derives from
an interplay between Personal Identification, i.e. his/her individual choice, and Community
Imitation, i.e. the collective trend. In order to test this assumption, we introduce the PICI
model, where personal music libraries are built through two processes. On the one hand, collective
effects, originating from social interactions between individuals, are mimicked by an exchange of
music groups between similar individuals. In order to define this similarity between two persons,
we compare their music libraries, and favour the pair interactions between people having similar
music taste, as in a Potts model [?]. On the other hand, there are individual mechanisms that push
people to distinguish themselves from their community. We model such dynamics by individual random
choices. We neglect the effect of an external field, like advertising, on individual behaviour.
Moreover, in order to reproduce the observed degree distributions of the bipartite graph [?], we
assume that the networks are growing in time. This is done in such a way that music groups are
chosen with preferential attachment [?], i.e. with a probability simply proportional to their
audience.

These requirements are put into form as follows. The system is composed of $L(t)$ users and $M(t)$
music groups, which are initially randomly linked. At each (Monte Carlo) time step, three processes
may occur:

(i) A new user may enter the system, with probability $p_I$. His/her library contains one music
group, chosen randomly in the set of previous groups with preferential attachment.

(ii) A randomly picked user adds a new music group to the library, with probability $p_N$. This new
group is appended to the collection of available music in the system.

(iii) Two randomly chosen users exchange their music knowledge, with probability $p_E$. The pair is
selected with a probability proportional to $e^{(\cos\theta_{\mu\lambda} - 1)/T}$, where
$\theta_{\mu\lambda}$ is the angle between the vectors of their music libraries (Eq. 1), defined by
their cosine (Eq. 2); the temperature $T$ is a parameter that represents the ability of
qualitatively different communities to mix together. If the pair is selected, we compare the two
music libraries, and give to each user a fraction of the unknown groups of his/her partner. Let us
stress that this rule ensures preferential attachment for the music groups.
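
The three rules above can be turned into a short simulation loop. The sketch below is one possible
reading, not the code used for the reported results, and it rests on several assumptions that the
text leaves open: the three processes are tested independently at each step, the pair selection is
implemented as an acceptance test with probability $e^{(\cos\theta - 1)/T}$, the exchanged
`fraction` is a free parameter set here to 0.5, and the system starts from two users sharing a
single group. All names are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine between two non-empty libraries stored as sets."""
    return len(a & b) / math.sqrt(len(a) * len(b))


def simulate_pici(steps, p_i=0.02, p_n=0.03, p_e=0.03, temperature=0.13,
                  fraction=0.5, seed=0):
    rng = random.Random(seed)
    libraries = [{0}, {0}]  # assumed initial state: two users, one group
    links = [0, 0]          # one entry per (user, group) link
    n_groups = 1
    for _ in range(steps):
        if rng.random() < p_i:  # (i) a new user enters the system
            group = rng.choice(links)  # probability proportional to audience
            libraries.append({group})
            links.append(group)
        if rng.random() < p_n:  # (ii) a user adds a brand-new group
            user = rng.randrange(len(libraries))
            libraries[user].add(n_groups)
            links.append(n_groups)
            n_groups += 1
        if rng.random() < p_e:  # (iii) exchange between two users
            a, b = rng.sample(range(len(libraries)), 2)
            la, lb = libraries[a], libraries[b]
            weight = math.exp((cosine(la, lb) - 1.0) / temperature)
            if rng.random() < weight:
                # Both differences are computed before any library changes.
                for source, target in ((la - lb, lb), (lb - la, la)):
                    unknown = sorted(source)
                    k = math.ceil(fraction * len(unknown))
                    for group in rng.sample(unknown, k):
                        target.add(group)
                        links.append(group)
    return libraries, links, n_groups


libraries, links, n_groups = simulate_pici(5000)
print(len(libraries), n_groups, len(links))
```

Drawing a group uniformly from the list of links is a compact way to obtain a probability
proportional to the audience, which is the preferential attachment required by rule (i).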

Some representative results of the simulations obtained from the model are shown in Fig. 9 for a
typical simulation set, with $p_I = 0.02$, $p_N = 0.03$, $p_E = 0.03$ and $T = 0.13$. A complete
analysis of the PICI model phase space variables and dynamics will be presented elsewhere. The
simulations were stopped after 200 time steps/node, in a system composed of 22800 users, 15126
music groups and 442666 links.

The degree distributions of the bipartite graph are de- 
picted in Figl^l The results reproduce quite well the 
exponential and the power-law features experimentally
found (Fig^. For the group distribution, the exponent 
is close to the empirical value 1.8. Moreover, different 
simulations show that this value remains in the vicinity 
of 2 for a large set of parameter values. 

For the user distribution, simulations also reproduce 
the deviations from the exponential for small number 
of groups na, as observed in FigQ] We have noticed 
(not shown) that these deviations diminish for increasing
values of T. This suggests that the self-organising mechanisms
associated with community structuring are responsible
for the extreme deviations.

The dependence of the clustering coefficient C on the 
filtering coefficient has also been considered. It is found 
that the simulations reproduce qualitatively well the
almost constant high value of C found in Fig^ How- 
ever this behaviour ceases to be observed for large values 
of the temperature, i.e. in systems where collective ef- 
fects do not develop by construction of the model. This 
seems to confirm the crucial competing roles played by 
individual choices and community influence in order to 
reproduce the observed data. 

VI. CONCLUSION 

In this article, we study empirically the musical be- 
haviours of a large sample of persons. Our analysis is 
based on complex network techniques, and leads to the 
uncovering of individual and collective trends from the 
data. To do so, we use two methods. On the one hand, we use
percolation idea-based techniques that consist in filtering 
correlation matrices, i.e. correlations between the lis- 
teners/music groups. Moreover, the communities/music 
genres are visualised by a branching representation. On 
the other hand, we explore the structure of the main 
percolated island by performing random walks on the network. The
goal is to map its internal structure and correlations onto 
a time series, that we analyse with standard statistical 
tools. 
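
The filtering idea recalled above can be sketched in a few lines. This is only an illustrative sketch under our own assumptions: libraries are stored as sets, the correlation between two listeners is the cosine of the angle between their library vectors, a link is kept when the correlation is at least equal to the filtering coefficient (the treatment of ties is our choice), and the function name `filtered_components` is hypothetical.

```python
import math


def cosine(lib_a, lib_b):
    """Cosine of the angle between two binary library vectors, stored as sets."""
    if not lib_a or not lib_b:
        return 0.0
    return len(lib_a & lib_b) / math.sqrt(len(lib_a) * len(lib_b))


def filtered_components(libraries, threshold):
    """Keep links with correlation >= threshold; return the connected islands."""
    n = len(libraries)
    parent = list(range(n))  # union-find forest over the listeners

    def find(i):
        # Follow parents up to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(libraries[i], libraries[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two islands

    islands = {}
    for i in range(n):
        islands.setdefault(find(i), []).append(i)
    return sorted(islands.values())


# Toy example: five listeners, two of whom share most of their groups.
toy = [{0, 1, 2}, {0, 1, 3}, {7, 8}, {7, 8, 9}, {1, 7}]
for phi in (0.3, 0.45, 0.6, 0.9):
    print(phi, filtered_components(toy, phi))
```

Raising the filtering coefficient splits the single island found at 0.3 into two islands at 0.45, three at 0.6 and five isolated listeners at 0.9, which is the kind of successive branching that the tree representation records.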

These methods reveal non-trivial connections
between the listeners/music groups. It is shown that, while
some empirical sub-divisions respect the standard genre
classification, many sub-divisions are harder to explain
from a standard genre-fication point of view. These collective
listening habits, that do not fit the neat usual
genres defined by the music industry, represent the
non-conventional taste of listeners. They could there- 
fore be an alternative objective way to classify music 
groups. These collective genre-hopping habits also sug- 
gest a growing eclecticism of music listeners that is 
driven by curiosity and self-identification, in opposition 
to the uniform trends promoted by commercial radio stations
and major record labels [32].

We would like to point out that the above methods should
help finding and visualising structures in a large variety 
of networks, e.g. the detection/classification of trends 
in marketing, show business and financial markets. Applications
should also be considered in taxonomy [33],
in scientometrics, i.e. how to classify scientific papers 
depending on their authors, journal, year, keywords..., 
and in linguistics |3^ . From a more theoretical point of 
view, this work is closely related to the theory of hidden
variables, the hidden variables being
here some intrinsic property of the music groups [38], and
should provide an empirical test for this theory. 

Finally, we introduce a simple growth model that reproduces
quite well the results obtained from the empirical
data, i.e. the observed degree distributions of the networks.
It is important to point out that the ingredients
of the model are very general, i.e. they imply competition between
personal identification and community imitation
(PICI). Consequently, PICI should apply to a larger variety
of systems than the music networks investigated here,
for instance to other networks such as collaboration
networks in science |2^. In a statistical physics sense, the 
model contains Potts model-like ingredients for opinion 
and taste formation. 